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Using error diagrams, we quantify the forecasting of cliaracteristic-eartliquake occurrence in a 
, recently introduced minimalist model. Initially we connect the earthquake alarm at a fixed time after 

. the ocurrence of a characteristic event. The evaluation of this strategy leads to a one-dimensional 

' numerical exploration of the loss function. This first strategy is then refined by considering a 

fSJ I classification of the seismic cycles of the model according to the presence, or not, of some factors 

related to the seismicity observed in the cycle. These factors, statistically speaking, enlarge or 
shorten the length of the cycles. The independent evaluation of the impact of these factors in the 
forecast process leads to two-dimensional numerical explorations. Finally, and as a third gradual step 
, in the process of refinement, we combine these factors leading to a three-dimensional exploration. 

fT^ ' The final improvement in the loss function is about 8.5%. 
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I. INTRODUCTION 



The earthquake process in seismic faults is a very complex natural phenomenon that present geophysics, in spite 
of its considerable efforts, has not yet been able to put into a sound and satisfactory status. However, in the crucial 
field of earthquake prediction, recent years have witnessed significant advances. For recent thorough reviews dealing 
Q with this issue, see Keilis-Borok (2002); Keilis-Borok and Soloviev (2002), and references therein, in particular chapter 
' iouT by Kossobokov and Shebalin. See also Lomnitz (1994). The introduction of new concepts coming from modern 
statistical physics seems to add some light and put some order into the intrinsic complexity of the lithosphere and its 
r*^ > dynamics. Thus, for example, references to critical phenomena, dynamical systems, hierarchical systems, fractals, self- 
O i' organized criticality and self-organized complexity are now found very frequently in geophysical literature (Turcotte, 
2000; Sornette, 2000; Gabrielov et al., 1999; Gabrielov et al., 2000). Hopefully, this conceptual framework will prove 
' its usefulness sooner better than later. 
^ , We have recently presented a simple statistical model of the cellular-automaton type which produces an earthquake 
spectrum similar to the characteristic earthquake behaviour of some seismic faults (Vazquez-Prada et al., 2002). 
I/"} The largest earthquakes on a fault or fault segment (the events that break its complete length) are usually termed 
1 characteristic (Schwartz and Coppersmith, 1984; Wesnousky, 1994; Dahmen et al., 1998). For this reason, in the 
minimalist model the event of maximum size is called the characteristic one. Our model is inspired by the concept 
of asperity, i.e., by the presence of a particularly strong element in the system which actually controls its relaxation. 
This model presents some notable properties, some of which will be reviewed in Section^] In Section Hill an algebraic 
" , procedure for the exact calculation of the probability distribution of the time of return of the characteristic earthquake 
^ is presented. The purpose of this paper is to quantify the forecasting of the characteristic earthquake occurrence in 
this model, using seismicity functions, which are observable, but not stress functions (Ben-Zion et al., 2003), which 
are not. In Section Hvl we construct an error diagram (Molchan, 1997; Newman and Turcotte, 2002) based on the 
^ time elapsed since the occurrence of the last characteristic event. This permits a first assessment of the degree of 
rS |. predictability. In Section we propose a general strategy of classification of the seismic cycles which, adequately 
' exploited in this model, allows a refinement of the forecasts. Finally, in Section IVll we present the conclusions. 
. 

^ • II. SOME PROPERTIES OF THE MODEL 

! 

In the minimalist model (Vazquez-Prada et al., 2002), a one-dimensional vertical array of length N is considered. 
The ordered levels of the array are labelled by an integer index i that runs upwards from 1 to N. This system performs 
two basic functions: it is loaded by receiving stress particles in its various levels and unloaded by emitting groups of 
particles through the first level i = 1. These emissions that relax the system are called earthquakes. 

These two functions (loading and unloading) proceed using the following four rules: 

i In each time unit, one particle arrives at the system. 

ii All the positions in the array, from i — 1 to i — N, have the same probability of receiving the new particle. 
When a position receives a particle we say that it is occupied. 
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iii If a new particle comes to a level which is already occupied, this particle has no effect on the system. Thus, a 
given position i can only be either non-occupied when no particle has come to it, or occupied when one or more 
particles have come to it. 

iv The level i = 1 is special. When a particle goes to this first position a relaxation event occurs. Then, if all the 
successive levels from i = 1 up to i = k are occupied, and the position fc + 1 is empty, the effect of the relaxation 
(or earthquake) is to unload all the levels from i — 1 up to i = k. Hence, the size of this relaxation is k, and 
the remaining levels i > k maintain their occupancy intact. 

Therefore, the size of the earthquakes in this model range from 1 up to N, being the event oik ~ N the characteristic 
one. Note that the three first rules of this model are exactly those of the forest-fires model (Drossel and Schwabl, 
1992). Our model has no parameter and, at a given time, the state of the system is specified by stating which of 
the {i > 1) N — 1 ordered levels are occupied. Each one of these possible occupation states corresponds to a stable 
configuration of the system, and therefore the total number of configurations is 2'-'^"^). These mentioned 2'^^"^^ 
stable configurations can be considered as the states of a finite, irreducible and aperiodic Markov chain with a unique 
stationary distribution (Durrett, 1999). 

The evolution rules of the model produce an earthquake size-frequency relation, pk, that is shown in Fig.^, where 
the results for = 10, = 100, and N — 1000 are superimposed. Note that this spectrum has a distribution 
of the characteristic-earthquake type: it exhibits a power-law relationship for small events, an excess of maximal 
(characteristic) events, and very few of the intermediate size. Besides, the three superimposed curves of probability 
are coincident. 

The result for the probability of return of the characteristic earthquake, P{n), is shown in Fig.^ for N = 20. Here 
n represents the time elapsed since the last characteristic event. During an initial time interval 1 < n < N , P{n) is 
null, then it grows to a maximum and then finally declines asymptotically to 0. (In Section Hvl P{n) for N — 20, will 
be usually denoted as curve a.) In Section IlIII we explain a general algebraic method for the exact computation of 

PH. 

The configurations of the model are classified into groups according to the number of levels, j, that are occupied 
(0 < j < — 1). Using the Markov-chain theory or producing simulations (Vazquez-Prada et al., 2002), one easily 
observes that in this model the system resides often in the configurations of maximum occupancy, i. e., in j = iV — 2 
and j = N — 1. 

This last property can be observed in Fig. where we have represented, for N — 100, the time evolution of the 
level of occupancy, j, in an interval long enough to observe the occurrence of several characteristic earthquakes. The 
typical pattern after a total depletion is a gradual recovery of j up to a new high level of occupancy. Once there, 
the system typically presents a plateau before the next characteristic earthquake. Especially during the ascending 
recoveries, the level of occupancy j suffers small falls corresponding to the occurrence of rather small earthquakes, that 
in this model are abundant. Of course, one also observes that occasionally j falls in a significant way corresponding 
to the occurrence of a > fc > N/2 intermediate earthquake. 

Due to the fact that this model is not critical, it is reasonable to consider it as an example of self-organized 
complexity (Gabrielov et al., 1999). 



III. ALGEBRAIC APPROACH TO P(N) 

The function P{n), for a minimalist system of size A^, is obtained from the Markov matrix of the system, M, 
following the following three steps: i) The element of the last row and first column of M is changed by a 0. After this 
pruning, the matrix will be called M'. ii) The new matrix M' is multiplied by itself n — 1 times to obtain M'*^""^) and 
the element of the first row, last column of this matrix is identified, iii) P{n) is the product of this selected matrix 
element times 1/A^. 

The whys of this recipe are explained in Vazquez-Prada et al. (2002), where P{n) ioi N = 2 is explicitly obtained 
by mere inspection. The result is 

P{n)^^,N^2. (1) 

The explicit form of P{n) for larger values of A^, can be achieved by exploiting the Jordan decomposition of M', 

M' = QJQ-\ (2) 

and hence, 



M'"-i = Q J"-i Q \ (3) 
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FIG. 1: ^) Probability of occurrence of earthquakes of size k. Note that the simulations corresponding to TV equal to 10, 
100, and 1000 are superimposed. ^) For A'^ = 20, the probability of return of the characteristic earthquake as a function of 
the time elapsed since the last event, n. ^) Time evolution of the state of occupation in a system of size A'^ = 100. Note that 
after each characteristic event that completely depletes the system, there follows the corresponding recovery up to a high level 
of occupancy, and then the system typically presents a plateau previous to the next characteristic earthquake. 
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The matrix J is formed by "Jordan blocks" in the diagonal positions, i.e., by square matrices whose elements are 
zero except for those on the principal diagonal, which are all equal, and those on the first superdiagonal, which are 
equal to unity. Thus, the task of obtaining an arbitrary power of J is simple because, as said, each Jordan block is 
the sum of two conmuting matrices: one is a constant times the unity matrix, and the other is nilpotent. Therefore, 
in the computation of any arbitrary power of J, each block is independent and the corresponding Newton bynomial 
formula can be applied. As an example, we now present the calculation of the case iV = 3. In this case. 




K n 1 1 h (4) 

which is decomposed as 



Therefore, 



and from eq. © 
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One could optimistically guess that P{n), for an arbitrary N, can be deduced from the systematics observed in the 
previous low-TV cases. This is disproved by the following formula, which is the result of P{n) for = 4. 
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(9) 



Although it is not apparent, this formula, as it should, vanishes for n — 3. As in eqs. and (jSJl, P{n) in eq. 
is adequately normalized: 



(10) 
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IV. ERROR DIAGRAM FOR THE FORECASTING OF THE CHARACTERISTIC EARTHQUAKE 

In the following paragraphs, we will stick to a model of size = 20 to make the pertinent comparisons. This size 
is big enough for our purposes here, and small enough to obtain good statistics in the simulations. 
For n = 20 the mean value of P(n) is 



oo 

< 71 >= E = 121.05, 

1=20 



(11) 
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the standard deviation is 



1/2 



55.21, (12) 



and the skew of the distribution is 



.1=20 



^ = ^ E ^(*) >)' = -O-IO- (13) 

i=20 



Now we enter into the matter of forecasting. As in any optimization strategy, we wiU try to achieve simultaneously 
the most in a property called A and the least in a property called B, these two purposes being contradictory in 
themselves. Here A is the (successful) forecast of the characteristic earthquakes produced in the system. Our desire 
is to forecast as many as possible, or ideally, all of them. B is the total amount of time that the earthquake alarm is 
switched on during the forecasting process. As is obvious, our desire would be that this time were a minimum. The 
maximization of A is equivalent to the minimization of an A' that represents the fraction of unsuccessful forecasts. 

Thus, in practice, our goal in this paper is to obtain simultaneously a minimum value for the two following functions, 
fe and fa- The first represents the fraction of unsuccessful forecasts, or fraction of failures; the second represents the 
fraction of alarm time. These two functions, in this first one-dimensional strategy of forecasting, are dependent only 
on the value of n, that is, the time elapsed since the last main event, and to which the alarm is connected. Using the 
function P{n) previously defined, they read as follows: 



/.(n) = J2 P<"'). (14) 



These two functions are plotted in Fig.|2jL. By eliminating n between fe{n) and fain), we obtain Fig. [2l3, which is 
the standard form of representing the so-called error diagram. The diagonal straight line would represent the result 
of a random forecasting strategy. The curved line is the result of this model. 

Error diagrams were introduced in earthquake forecasting by Molchan who contributed with rigorous mathematical 
analysis to the optimization of the earthquake prediction strategies (Molchan, 1997). In his papers Molchan used r 
and n to represent the alarm fraction and the error fraction respectively; and put r in the horizontal axis. 

To fix ideas, it is convenient to define a so-called loss function, L, which expresses the trade-off between costs and 
benefits in the forecasting (Keilis-Borok, 2002). Among all the possible loss functions, we will choose the simple linear 
function 

L^fa + fe- (16) 

L{n) is also drawn if Fig|2^. The position Ua = 66 provides the minimum value of L{n). L{na) = 0.578. Note that 
Ua does not coincide either with the n that maximizes P(n), or with <n>. 

V. IMPROVING THE FORECASTS 

In Section IIVI we adopted the strategy of connecting the alarm at a fixed time, n, after the occurrence of a 
characteristic event. The evaluation of this strategy leads to the conclusion that for n = Ua = 66, the loss function 
has a minimum value L{na) — 0.578. The question now is: Can we think up other strategies that render better 
results? To answer this question, we now return to our previous comments on Fig. ^ 

If we define a medium-size earthquake as an event with a size between A^/2 and — 1, i.e. N > k > N/2, 
by observing the graphs in Fig. ^ one is led to the conclusion that in this model the occurrence of a medium-size 
earthquake is not frequent but when it actually takes place, the time of return of the characteristic quake in that 
cycle is increased. 

This qualitative perception can be substantiated by numerically obtaining the probability of having cycles where 
no medium-size earthquake occurs, i.e., k < N/2. This information is completed by the distribution of cycles where 
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FIG. 2: For N — 20, a) Fraction of failures to predict, /e, fraction of alarm time, /„, and loss function L = + /e as a function 
of n. b) Error diagram for characteristic event forecasts based on n. The diagonal line would correspond to a random strategy. 

the condition N > k > N/2 does occur. These two distributions are shown in Fig. as hnes b and c respectively. 
Here, line a represents the total distribution of the times of return of the characteristic earthquake in this model (the 
same as plotted in Fig. ^). Note that, as it should, the distribution a covers both distributions b and c. The mean 
time < n > for the three distributions is < rt >a= 121.05, < n >b= 107.57 and < n >c= 166.84. The fraction of 
cycles under b is 0.77 and the fraction under c is 0.23. A splitting of this type, in which the a distribution separates 
into b and c, will be denoted henceforth as a = 6 © c. 

To check if a = 5 © c is potentially useful for our purposes, we will now analyze independently these two sets of 
cycles, b and c, with the method used in Section Hvl for curve a. The result is the following: the best working n for 
dealing with the cycles under distribution 6 is n;, = 60. And with respect to the cycles belonging to the distribution 
under c, the best n is ric — 124. 

Therefore, we will now study again the whole set of cycles, i.e. those under a, by means of a retarding strategy, 
which is based on the splitting a = 6 © c. We will adopt the following steps: in any cycle, we will wait until an n, 
named rireti (which is near to nb), before taking any decision. If no medium earthquake has occurred so far, then the 
alarm is connected at Ureti- If, on the contrary, a medium quake has occurred before Ureti, then we move the alarm 
to nret2 (which is close to ric). This notation rireti and nret2 conies from the retarding strategy that we are exploring 
now. This two-dimensional strategy is implemented by varying rireti and nret2 looking for the best value of L. This 
is illustrated in Fig. The best option is rireti = 61 and nret2 = 101, with i(nreti, ^^6*2) = 0.549. 

Now we look for a similar property that can classify the cycles from another point of view. This new property 
consists in identifying the cycles where the sum of the sizes of all the earthquakes before the characteristic one is less 
than N/2. This condition will be represented by SUM < N/2. The reason for this choice is that if SUM < N/2, 
the system, statistically speaking, tends to reach more rapidly the configurations of maximum occupancy, j — N ~ 2 
and j = N — 1, and the time of return of the characteristic quake in that cycle tends to be smaller (see Fig. ^) . In 
Fig. Ob, line a represents, as in Fig. Ot, the distribution of return intervals of the characteristic earthquake for all the 
cycles of the model. And lines / and g represent, respectively, the separation of line a according to the fulfilment, or 
not, of the SUM < N/2 condition, a = f ® g. The mean value of the / and g distributions is < n >/= 88.78 and 
< n >g— 151.69 respectively. The fraction of events under the / and g hnes is 36.96 and 63.04 respectively. 

This second splitting of the whole set of cycles in the model, a = / © g, can be used as an advancing strategy in 
parallel to what we did with the retarding strategy. Thus the independent analysis of curve / leads to = 60, and 
the similar analysis of curve g leads to rig =^ 90. 

We will now study again the whole set of cycles (under a) by means of the advancing strategy, which is based on 
the splitting a — f (B g. Therefore, wc proceed as follows: In any cycle, we wait until Uadvi (which is close to rif) 
before taking any decision. If the condition SUM < N/2 has been fulfilled, then the alarm is connected at n — riadvi- 
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FIG. 3; 1^) For — 20. Line a is the distribution of return times of the characteristic earthquake as a function of the time 
elapsed since the last event, n. Line b corresponds to the distribution of cycles where no medium-size earthquake occurs. Line 
c corresponds to cycles with medium-size earthquakes. Curves 6 and c constitute the splitting of curve a according to whether 
this retarding effect is fulfilled or not. |3>) Lines / and g, represent the separation of the a distribution according to whether 
the advancing effect is fulfilled or not. 



If, on the contrary, this condition has not been fulfilled, then we move the alarm to nadv2 (which is close to Ug). This 
two-dimensional strategy is implemented by varying Uadvi and nadv2 looking for the lowest L. This is illustrated in 
Fig. 2J3. The search for the best option leads to Uadvi = 61 and nadv2 — 90, with L{nadvi,nadv2) — 0.537. This value 
of L is slightly better than that obtained using the retarding strategy. 

Inspired by these results, we will now analize the possibilities of a mixed strategy which contains conceptual elements 
of the two partial strategies discussed so far. Here we will explore a 3-dimensional grid of points (ni, n2, n^) looking for 
the minimization of L. The first coordinate, ni,will be explored in the neighbourhood of riadwi, the second coordinate 
n2 in the neighbourhood of nadv2, and finally near nret2- The two succesive key decisions to be taken are: 

i In any cycle, we wait until n — ni. If SUM < N/2 IS fulfilled, we connect the alarm at ni and leave it there. 
If at n = ni, SUM < N/2 is NOT fulfilled, we move the alarm to n2. And, 

ii (We are now at 712). If no medium-size event has occurred between ni and ri2, we leave the alarm connected at 
7X2. If, on the contrary, one or more medium-size events have occurred in this interval, then we move the alarm 
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FIG. 4: N = 20. Results of the two-dimensional strategy based on the splitting of curve a according to whether 

the retarding effect is fulfilled or not. Values of L varying n,.eti and nret2- Minimum value of L = 0.549 for nreti = 61 and 
nret2 = 101. EJ)) Results of the two-dimensional strategy based on the splitting of curve a according to whether the advancing 
effect is fulfilled or not. Values of L varying Uadvi and nadv2- Minimum value of L = 0.537 for Uadvi = 61 and nadv2 ~ 90. 



to n^. 

The search for the triplet (71,1,712,713) that makes L minimum is illustrated in Fig. |S1 The result corresponds to 
{ill — 61, 7i2 — 84, — 104), and there, L = 0.528. This is the best result obtained in this work. 
Thus, the improvement obtained in L, when passing from L{na) to L(7t,1, n2, 7x3) is around ~ 8.5%. 



VI. CONCLUSIONS 



In this paper, we have analyzed the behaviour of the minimalist model in relation to a quantitative assesment of 
the forecasting of its successive characteristic earthquakes. We have chosen a simple loss function, L = fa + fe- Our 
first try, based on a one-dimensional search in n, produces a minimum result of L around 0.578. This was illustrated 
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FIG. 5: For — 20. Illustration of the three-dimensional strategy. For ni =61, L-constant level-curves are plotted. The 
minimum value of L is 0.528 for ni = 61, 712 ~ 84 and na = 104. 

in Fig. With the aim of improving the forecasts, we then explored two modes of a common strategy that divides 
the probabihty distribution of the time of return of the characteristic earthquake into two distinct distributions. The 
first mode consists in using the occurrence of intermediate-magnitude earthquakes as a sign that the characteristic 
earthquake would Ukely return at a time later than usual in that cycle. This is based on the fact that medium-size 
events significantly deplete the load in the system and its recovery induces a retardation. This effect takes place in 
any system of the sand-pile type. The exploitation of this idea leads to a two-dimensional search that finally renders 
an L value around 0.549. (Fig. The second idea consists in using the fact that a significant absence of small 

earthquakes during a sizeable lapse of time in the cycle is a sign of imminence of the next characteristic event, or 
at least of a shortening of its period of return. This strategy is similar to the old wisdom in seismology that links a 
steady absence of earthquakes in a fault with the increase in the risk of occurrence of a big event. The exploration of 
this idea proceeds similarly to what we did with the retarding strategy: this also leads to a two-dimensional search. 
It renders a minimum L around 0.537. (Fig0lD). 

Finally, a mixed strategy that tries to incorporate the information acquired is implemented by means of a three- 
dimensional search, and provides a value of L = 0.528. The identification of the three optimum parameters is 
illustrated in Fig. [S| 

It is important to remark that the information we have used in our forecasts is based only in the observed systematics 
of earthquake occurrence in the model, i.e., only seismicity functions have been used. Thus, for example, in Section 
0we have not used the state of occupancy of the system j, which would have given much more accurate predictions. 
In real life, the use of this information would be equivalent to knowing, in real time, the value of the stress level and 
the failure threshold at any point in a fault. 
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